Abstract. We prove a general ergodic-theoretic result concerning the return time
statistic, which, properly understood, sheds some new light on the common sense
phenomenon known as the law of series. Let $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ be an ergodic process on
finitely many states, with positive entropy. We show that the distribution function
of the normalized waiting time for the first visit to a small cylinder set $B$ is, for the
majority of such cylinders and up to epsilon, dominated by the exponential distribution
function $1 - e^{-t}$. This fact has the following interpretation: the occurrences of such
a "rare event" $B$ can deviate from purely random in only one direction, so that for
any length of an "observation period" of time, the first occurrence of $B$ "attracts"
its further repetitions in this period.



Note 

This paper resulted from studying asymptotic laws for return/hitting time statistics
in stationary processes, a field in ergodic theory that has been developing rapidly in
recent years (see e.g. [A-G], [C], [C-K], [D-M], [H-L-V], [L] and the references therein). Our
result contributes significantly to this area owing to both its generality and the strength
of the assertion. After the proof had been completely written, the authors discovered,
during an informal discussion, an astonishing interpretation of the result,
clear even in terms of the common sense understanding of random processes. The
organization of the paper is aimed at emphasizing this discovery. The consequences
for the field of asymptotic laws are moved toward the end of the paper.

Introduction 

The phenomenon known as the law of series appears in many aspects of every- 
day life, science and technology. In the common sense understanding it signifies 
a sudden increase of frequency of a rare event, seemingly violating the rules of 
probability. Let us quote from Jean Moisset ([Mo]): 

This law can be defined as: the repetition of identical or analogous events, things or 
symbols in space or time; for example: the announcement of several similar accidents 
on the same day, a series of strange events experienced by someone on the same day 
which are either happy ones (a period of good luck) or unfavorable (disastrous) ones, 
or the repetition of unexpected similar events. For example, you are invited to dinner 
and you are served a roast beef and you note that you were served the same menu 
the day before at your uncle's home and the day before that at your cousin's home. 
Another proverb describing more or less the same (with regard to unwanted events)
is misfortune seldom comes alone. Both expressions exist in many languages, proving
that the phenomenon has been commonly noticed throughout the world. In
this setting it has been assigned to the category of unexplained mysteries, paraphysics
and parapsychology, together with "malignancy of fate", "Murphy laws", etc.
Many pseudoscientific experiments have been conducted to prove an obvious violation
of statistical laws, where such laws were usually identified with the statistics of a
series of independent trials ([St], [Km]). Equally many texts have been devoted
to explaining the anomaly within the framework of an independent process (see e.g.,
[Mi]), or merely as a weakness of our memory, keen to register unusual events as
more frequent just because they are more distinctive.

The phenomenon is also known in more serious science. In modeling and sta- 
tistics it is sometimes called "clustering of data". It is experimentally observed 
in many real processes, such as traffic jams, telecommunication network overloads, 
power consumption peaks, demographic peaks, stock market fluctuations, etc., as 
periods of increased frequency of occurrences of certain rare events. These anom- 
alies are usually explained in terms of physical dependence (periods of propitious 
conditions) and complicated algorithms are implemented in modeling these pro- 
cesses to simulate them. 

But, to our knowledge, there was no logical construction proving, in full gener- 
ality, that there exists a "natural" tendency of rare events to appear in series, and 
the result we will present in this paper, or any of its possible variants, remained 
until now unnoticed by the specialists. We prove an ergodic-theoretic theorem on 
stationary stochastic processes, in which a wide range of "rare events" is shown to 
behave either in a way which we call "unbiased" (i.e., as in an independent pro- 
cess), or else exactly as it is specified in the law of series, i.e., so that the first 
occurrence increases the chances of untimely repetitions. Roughly speaking, we 
prove that rare events appear in series whenever the unbiased behavior 
is perturbed: there is no other choice. Besides ergodicity (which is automatically
satisfied if we observe a single realization, chosen at random, of any process)
we make only one essential, but obviously necessary, assumption on the process: it
must maintain a "touch of randomness", i.e., the future must not be completely
determined by the past, which is equivalent to assuming positive entropy. Without
this assumption a rotation of a compact group is an immediate example where
events never appear in series. Of course, not every interesting rare event in reality
can be modeled by the type of set we describe (cylinder over a long block), and we
do not claim that our theorem fully explains the common sense phenomenon, but
it certainly sheds some new light on it.

In terms of ergodic theory, we define two elementary antagonistic properties of
the return times called "attracting" and "repelling", and we prove that they behave
quite differently in processes of zero and of positive entropy: attracting can persist 
for arbitrarily long blocks in both cases, while repelling must decay (as the length 
of blocks grows to infinity) in positive entropy processes. Many properties are 
known to differentiate between positive and zero entropy, but most of them involve 
a passage via measure-theoretic isomorphism, i.e., change of a generator, or require 
some additional structure. Our "decay of repelling" holds in general and for any 
finite generator, or even partition, as long as it generates positive entropy. 

It is impossible not to mention here the theorem of Ornstein and Weiss [O-W2]
which relates the return times of long blocks to entropy. However, this theorem says
nothing about attracting or repelling, because the limit appearing in the statement
is insensitive to the proportions between gap sizes. Nevertheless, this remarkable
result is very useful and it will help also in our proof. Our theorem's proof is entirely
contained within the classics of ergodic theory; it relies on basic facts on entropy for
partitions and sigma-fields, some elements of the Ornstein theory ($\varepsilon$-independence),
the Shannon-McMillan-Breiman Theorem, the Ornstein-Weiss Theorem on return
times, the Ergodic Theorem, basics of probability and calculus. We do not invoke
any specialized machinery of stochastic processes or statistics.

The authors would like to thank Dan Rudolph for a hint leading to the con- 
struction of Example 2 and, in effect, to the discovery of the attracting/repelling 
asymmetry. We also thank Jean-Paul Thouvenot for his interest in the subject, 
substantial help, and the challenge to find a purely combinatorial proof (which we 
save for the future). 

Rigorous definition and statement 

We establish the notation necessary to formulate the main result. Let $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$
be an ergodic process on finitely many symbols, i.e., $\#\mathcal{P} < \infty$, $\sigma$ is the standard
left shift map and $\mu$ is an ergodic shift-invariant probability measure on $\mathcal{P}^{\mathbb{Z}}$. Most
of the time, we will identify finite blocks with their cylinder sets, i.e., we agree that
$\mathcal{P}^n = \bigvee_{i=0}^{n-1} \sigma^{-i}(\mathcal{P})$. Depending on the context, a block $B \in \mathcal{P}^n$ is attached to
some coordinates or it represents a "word" which may appear in different places
along the $\mathcal{P}$-names. We will also use the probabilistic language of random variables.
Then $\mu\{R \in A\}$ ($A \subset \mathbb{R}$) will abbreviate $\mu(\{x \in \mathcal{P}^{\mathbb{Z}} : R(x) \in A\})$. Recall that
if the random variable $R$ is nonnegative and $F(t) = \mu\{R \le t\}$ is its distribution
function, then the expected value of $R$ equals $\int_0^\infty 1 - F(t)\,dt$.

For a set $B$ of positive measure let $R_B$ and $\overline{R}_B$ denote the random variables
defined on $B$ (with the conditional measure $\mu_B = \frac{\mu}{\mu(B)}$) as the absolute and normalized
first return time to $B$, respectively, i.e.,

$$R_B(y) = \min\{i > 0 : \sigma^i(y) \in B\}, \qquad \overline{R}_B(y) = \mu(B) R_B(y).$$

Notice that, by the Kac Theorem ([Kc]), the expected value of $R_B$ equals $\frac{1}{\mu(B)}$,
hence that of $\overline{R}_B$ is 1 (that is why we call it "normalized"). Writing $F_B$ for the
distribution function of $\overline{R}_B$, we also define

$$G_B(t) = \int_0^t 1 - F_B(s)\,ds.$$

(The interpretation of this function is discussed in the following section.) Clearly,
$G_B(t) \le \min\{t, 1\}$ and the equality holds when $F_B = 1_{[1,\infty)}$, that is, when $B$
occurs precisely with equal gaps, i.e., periodically; the gap size then equals $\frac{1}{\mu(B)}$.
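As a hedged illustration (not part of the original argument), $G_B$ can be estimated from a finite list of observed gaps between consecutive occurrences of $B$. The sketch below assumes that the empirical mean gap stands in for $\frac{1}{\mu(B)}$ (as the Kac Theorem suggests) and uses the elementary identity $\int_0^t 1 - F(s)\,ds = E[\min(R, t)]$ for a nonnegative random variable $R$; the function name `empirical_G` and the sample gap lists are our own.

```python
def empirical_G(gaps, t):
    """Estimate G_B(t) from observed gaps between consecutive visits to B.

    Assumption: the mean gap plays the role of 1/mu(B) (Kac Theorem), so the
    normalized return times are gap / mean_gap. Uses the identity
    integral_0^t (1 - F(s)) ds = E[min(R, t)] for a nonnegative variable R.
    """
    mean_gap = sum(gaps) / len(gaps)
    normalized = [g / mean_gap for g in gaps]
    return sum(min(r, t) for r in normalized) / len(normalized)


periodic = [5, 5, 5, 5]    # equal gaps: G_B(t) = min(t, 1)
clustered = [1, 1, 1, 9]   # three short gaps followed by one long pause

print(empirical_G(periodic, 0.5), empirical_G(periodic, 2.0))
print(empirical_G(clustered, 1.0))
```

For the periodic list the estimate reproduces $\min\{t, 1\}$, while the clustered list gives a smaller value at $t = 1$, in line with the remark that equality holds only for equal gaps.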
The key notions of this work are defined below: 

Definition 1. We say that the visits to $B$ repel (resp. attract) each other with
intensity $\varepsilon$ from a distance $t > 0$, if

$$G_B(t) \ge 1 - e^{-t} + \varepsilon \quad \text{(resp. if } G_B(t) \le 1 - e^{-t} - \varepsilon\text{)}.$$

We abbreviate that $B$ repels (attracts) with intensity $\varepsilon$ if its visits repel (attract)
each other with intensity $\varepsilon$ from some distance $t$.

Obviously, occurrences of an event may simultaneously repel from one distance
and attract from another. Notice that the maximal intensity of repelling is $e^{-1}$,
achieved at $t = 1$ when $B$ appears periodically. The intensity of attracting can
be arbitrarily close to 1 (when $B$ appears in enormous clusters separated by huge
pauses; see the next section). The main result follows:
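The claim about the maximal intensity of repelling can be checked numerically. The following is a minimal sketch, assuming the periodic case $G_B(t) = \min\{t, 1\}$; the helper names `repelling_intensity` and `periodic_G` and the grid of distances are ours.

```python
import math


def repelling_intensity(G_value, t):
    """Signed deviation G_B(t) - (1 - e^{-t}); positive values indicate
    repelling from distance t, negative values indicate attracting."""
    return G_value - (1.0 - math.exp(-t))


def periodic_G(t):
    """G_B for a periodically occurring B: G_B(t) = min(t, 1)."""
    return min(t, 1.0)


grid = [i / 100 for i in range(1, 501)]   # distances t in (0, 5]
best_t = max(grid, key=lambda t: repelling_intensity(periodic_G(t), t))
best = repelling_intensity(periodic_G(best_t), best_t)
print(best_t, best, math.exp(-1))
```

On this grid the largest deviation is found at $t = 1$ and agrees with $e^{-1}$ up to rounding.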
Theorem 1. If $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is ergodic and has positive entropy, then for every $\varepsilon > 0$
the measure of the union of all $n$-blocks $B \in \mathcal{P}^n$ which repel with intensity $\varepsilon$
converges to zero as $n$ grows to infinity.

We also provide an example (Example 2) in which, for a substantial collection of 
lengths, the majority of cylinders display strong attracting. Moreover, the process 
of Example 2 is isomorphic to a Bernoulli process, which implies that a partition 
with such strong attracting properties can be found in any measure-preserving 
transformation of positive entropy (see the Remark 3). Let us mention that it is easy 
to find zero entropy examples with either persistent repelling (discrete spectrum)
or attracting (see the Example 3), or even both at a time (see the Remark 4). 

Interpretation and its limits 

Let us elaborate a bit on the meaning of attracting and repelling for an event
$B$. Let $V_B$ be the random variable defined on $X$ as the hitting time statistic, i.e.,
the waiting time for the first visit in $B$ (the defining formula is the same as for $R_B$,
but this time it is regarded on $X$ with the measure $\mu$). Further, let $\overline{V}_B = \mu(B) V_B$,
called, by analogy, the normalized hitting time (although the expected value of this
variable need not be equal to 1). By ergodicity, $V_B$ and $\overline{V}_B$ are well defined. By
an elementary consideration of the skyscraper above $B$, one easily verifies that the
distribution function $\tilde{F}_B$ of $\overline{V}_B$ satisfies the inequalities:

$$G_B(t) - \mu(B) \le \tilde{F}_B(t) \le G_B(t)$$

(see [H-L-V] for more details). Because we deal with long blocks (so that, by the
Shannon-McMillan-Breiman Theorem, $\mu(B)$ is, with high probability, very small),
for the sake of the interpretation, we will simply assume that $\tilde{F}_B = G_B$. Thus, attracting
and repelling can be considered properties of the hitting rather than return time
statistic. In fact, if we replace $G_B$ by $\tilde{F}_B$ in the definition of attracting/repelling,
the formulation of Theorem 1 remains exactly the same, because it admits tolerance
up to a fixed $\varepsilon$.
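The two-sided inequality between the hitting time distribution and $G_B$ can be verified exactly on a toy model. The sketch below is only an illustration under our own assumptions: a cyclic 0/1 sequence (a periodic orbit with the uniform measure) stands in for the ergodic process, a 1 marks an occurrence of $B$, and the name `sandwich_check` is hypothetical.

```python
import math


def sandwich_check(visits, t):
    """For a cyclic 0/1 sequence `visits` (1 marks an occurrence of B), return
    the triple (G_B(t) - mu(B), F_hit(t), G_B(t)), where F_hit is the
    distribution function of the normalized hitting time under the uniform
    measure on the cycle."""
    N = len(visits)
    hits = [i for i, v in enumerate(visits) if v]
    mu_B = len(hits) / N
    # gaps between consecutive visits, going around the cycle
    gaps = [(hits[(j + 1) % len(hits)] - hits[j] - 1) % N + 1
            for j in range(len(hits))]
    T = t / mu_B                                   # absolute time horizon
    G = sum(min(mu_B * g, t) for g in gaps) / len(gaps)
    # a gap of length g contains min(g, floor(T)) points with hitting time <= T
    F_hit = sum(min(g, math.floor(T)) for g in gaps) / N
    return G - mu_B, F_hit, G


cycle = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
for t in (0.3, 1.0, 1.3, 2.0):
    print(t, sandwich_check(cycle, t))
```

In this finite setting the difference between $G_B(t)$ and the hitting time distribution comes only from rounding the time horizon down to an integer in each gap, which is why it stays below $\mu(B)$.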

It is easy to see that if $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is an independent Bernoulli process, then, for
any long block $B$, $F_B(t) \approx 1 - e^{-t}$ (and also $\tilde{F}_B(t) \approx 1 - e^{-t}$) with high uniform
accuracy. We will call such behavior "unbiased" (neither attracting nor repelling),
and attracting and repelling can be viewed as deviations from the unbiased pattern.

Fix some $t > 0$. Consider the random variable $I$ counting the number of
occurrences of $B$ in the time period $[0, \frac{t}{\mu(B)}]$. The expected value of $I$ equals
$\mu(B)\lfloor \frac{t}{\mu(B)} \rfloor \approx t$ (up to the ignorable error $\mu(B)$). On the other hand, $\mu\{I > 0\} =
\mu\{V_B \le \frac{t}{\mu(B)}\} = \tilde{F}_B(t)$. Attracting from the distance $t$ occurs when the last value is
smaller than it would be (say, for the same cylinder $B$) in an independent process.
Because the expected value of $I$ in the independent process is maintained (and
equals approximately $t$), the conditional expected value of $I$ on the set $\{I > 0\}$
must be larger in $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ than in the independent process. This fact can be
interpreted as follows: if we observe the process for time $\frac{t}{\mu(B)}$ (which is our "memory
length" or "lifetime of the observer") and we happen to see the event $B$ during this
time at least once, then the expected number of times we will observe the event $B$
is larger than the analogous value in the independent process. The first occurrence
of $B$ "attracts" further repetitions (misfortune seldom comes alone).
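The arithmetic behind this interpretation is short enough to spell out. The sketch below assumes the approximations used in the text (expected count close to $t$, probability of at least one occurrence given by the hitting time distribution at $t$); the attracting intensity 0.2 is a purely hypothetical value chosen for illustration, and the function name is ours.

```python
import math


def conditional_expected_count(t, hit_probability):
    """Approximate E[I | I > 0] = E[I] / P(I > 0), with E[I] taken to be t,
    where I counts occurrences of B in the window [0, t / mu(B)]."""
    return t / hit_probability


t = 1.0
unbiased = 1.0 - math.exp(-t)   # P(I > 0) in the independent process
attracting = unbiased - 0.2     # hypothetical: attracting with intensity 0.2

print(conditional_expected_count(t, unbiased))
print(conditional_expected_count(t, attracting))
```

A smaller probability of seeing $B$ at all, with the same overall expected count, forces a larger expected count on the windows where $B$ does appear.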

repelling     B B B...B B B B....B B....B..B ....B B..

unbiased      B B....B..B....B B B..B B..B.B B B..

attracting    B B..B.B..B B BB B.BB B B..

strong attr   BBB.BB B BBB.BB.BB

Figure 1: Comparison between unbiased, repelling and attracting distributions of copies of a block.
Attracting with intensity close to 1 occurs when $G_B$ is very "flat" (close to zero on a long initial
interval). Then $F_B$ is immediately very close to 1, indicating that on most of $B$ the first return
time is much smaller than $\frac{1}{\mu(B)}$. Of course, this must be compensated on a small part of $B$ by
extremely large values of the return time. This means that the visits to $B$ occur in enormous
clusters of very high frequency, compensated by huge pauses with no (or very few) visits. Such
a pattern will be called "strong attracting" and it will take place in some of our examples.

Repelling from the distance t means exactly the opposite: The first occurrence 
lowers the expected number of repetitions within the observation period, i.e., repels 
them. If we have a mixed behavior, our impression about whether the event attracts 
or repels its repetitions depends on the length of our "memory". Attracting not 
assisted by repelling (or assisted by repelling of an ignorably small intensity) means 
that no matter what memory length we apply, either we see a nearly unbiased
behavior or the first occurrence visibly attracts further repetitions. Our Theorem 1
asserts that if we observe longer and longer blocks $B$, repelling from any distance
must decay in both measure and intensity (while attracting can persist), so that
for the majority of long blocks we will see the behavior as described above.

We also note that by pushing the graph of $\tilde{F}_B$ downward (compared to $1 - e^{-t}$),
attracting contributes to increasing the expected value of the associated random
variable, i.e., of the hitting time. In the case of attracting assisted by only very small
intensity repelling, the average waiting time for the first occurrence of the event $B$
is increased in comparison to unbiased (it may even fail to exist). Thus, instinctively
judging the probability of the event by (the inverse of) the waiting time for the 
first occurrence we will typically underestimate it. All the more we are surprised, 
when the following occurrences happen after a considerably shorter time. This 
additionally strengthens the phenomenon's appearance. 

Another consequence of attracting not assisted by repelling (or assisted by re- 
pelling of a very small intensity) is an increased variance of the return time statistic 
(the variance may even cease to exist). Thus, again, the gaps between the occurrences
of $B$ are driven away from the expected value, toward the extremities 0 and
$\infty$, and hence, into the pattern of clusters separated by longer pauses. We skip the
elementary estimations of the variance. 

It must be stressed that we do not claim, for any class of processes, that occurrences
of long blocks will actually deviate from unbiased. There are conditions, weaker 
than full independence, under which the distributions of the normalized return times 
of long blocks converge almost surely to the exponential law. It is so, for instance, 
in Markov processes (with finite memory). In fact, such convergence is implied 
by a sufficient rate of mixing ([A-G], [H-S-V]). Yet, such processes seem to be 
somewhat exceptional and we expect that attracting prevails in the majority of processes
(see the Question 5 at the end of the paper). As we have already mentioned, at 
least that much is true, that in any dynamical system with positive entropy there 
exist partitions with strong attracting properties. 
It is important not to be misled by an oversimplified approach. The "decay of
repelling" in positive entropy processes appears to agree with the intuitive understanding
of entropy as chaos: repelling is a "self-organizing" property; it leads to
a more uniform, hence less chaotic, distribution of an event along a typical orbit.
Thus one might expect that repelling with intensity $\varepsilon$ revealed by a fraction $\xi$ of all
$n$-blocks contributes to lowering an upper estimate of the entropy by some percentage
proportional to $\xi$ and depending increasingly on $\varepsilon$. If this happens for infinitely
many lengths $n$ with the same parameters $\xi$ and $\varepsilon$, the entropy should be driven
to zero by a geometric progression. Surprisingly it is not quite so, and the phenomenon
has more subtle grounds. We will present an example which exhibits the
incorrectness of such intuition (see Example 1 and the preceding discussion).
Also, it will become obvious from the proof, that there is no gradual reduction of 
the entropy. The entropy is "killed completely in one step", that means, positive 
entropy and persistent repelling lead to a contradiction by examining the blocks 
of one sufficiently large length n; we do not use any iterated procedure requiring 
repelling for infinitely many lengths. 

Notation and preliminary facts 

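We now establish further notation and preliminaries needed in the proof. If
$A \subset \mathbb{Z}$ then we will write $\mathcal{P}^A$ to denote the partition or sigma-field $\bigvee_{i \in A} \sigma^{-i}(\mathcal{P})$.
We will abbreviate $\mathcal{P}^n = \mathcal{P}^{[0,n)}$, $\mathcal{P}^{-n} = \mathcal{P}^{[-n,-1]}$, $\mathcal{P}^- = \mathcal{P}^{(-\infty,-1]}$ (a "finite future",
a "finite past", and the "full past" of the process).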

We assume familiarity of the reader with the basics of entropy for finite partitions 
and sigma-fields in a standard probability space. Our notation is compatible with 
[P] and we refer the reader to this book, as well as [Sh] and [W], for background 
and proofs. In particular, we will be using the following: 

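* The entropy of a partition equals $H(\mathcal{P}) = -\sum_{A \in \mathcal{P}} \mu(A) \log_2(\mu(A))$.

* For two finite partitions $\mathcal{P}$ and $\mathcal{B}$, the conditional entropy $H(\mathcal{P}|\mathcal{B})$ is equal
to $\sum_{B \in \mathcal{B}} \mu(B) H_B(\mathcal{P})$, where $H_B$ is the entropy evaluated for the conditional
measure $\mu_B$ on $B$.

* The same formula holds for conditional entropy given a sub-sigma-field $\mathcal{C}$, i.e.,

$$\sum_{B \in \mathcal{B}} \mu(B) H_B(\mathcal{P}|\mathcal{C}) = H(\mathcal{P}|\mathcal{B} \vee \mathcal{C}).$$

* The entropy of the process is given by any one of the formulas below

$$h = H(\mathcal{P}|\mathcal{P}^-) = \frac{1}{r} H(\mathcal{P}^r|\mathcal{P}^-) = \lim_{r \to \infty} \frac{1}{r} H(\mathcal{P}^r).$$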
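For finite partitions the first two formulas are easy to evaluate directly. The following sketch is an illustration only: it assumes a joint distribution given as a table `joint[a][b]` holding the measure of the intersection of the $a$-th element of $\mathcal{P}$ with the $b$-th element of $\mathcal{B}$; the table values and function names are ours.

```python
import math


def entropy(probs):
    """Shannon entropy (base 2) of a finite probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(P | B) = sum over B of mu(B) * H_B(P), for a joint table
    joint[a][b] = mu(A_a intersected with B_b)."""
    total = 0.0
    for b in range(len(joint[0])):
        mu_b = sum(row[b] for row in joint)
        if mu_b > 0:
            total += mu_b * entropy([row[b] / mu_b for row in joint])
    return total


joint = [[0.4, 0.1],
         [0.1, 0.4]]
flat = [p for row in joint for p in row]
marginal_B = [sum(row[b] for row in joint) for b in range(2)]

print(conditional_entropy(joint))
print(entropy(flat) - entropy(marginal_B))   # H(P v B) - H(B), the same value
```

The two printed numbers agree, which is the standard chain rule for the entropy of the join of $\mathcal{P}$ and $\mathcal{B}$.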

We will exploit the notion of $\varepsilon$-independence for partitions and sigma-fields. The
definition below is an adaptation from [Sh], where it concerns finite partitions only.
See also [Sm] for treatment of countable partitions. Because "$\varepsilon$" is reserved for the
intensity of repelling, we will speak about $\beta$-independence.

Definition 2. Fix $\beta > 0$. A partition $\mathcal{P}$ is said to be $\beta$-independent of a sigma-field
$\mathcal{B}$ if for any $\mathcal{B}$-measurable countable partition $\mathcal{B}'$ holds

$$\sum_{A \in \mathcal{P},\, B \in \mathcal{B}'} |\mu(A \cap B) - \mu(A)\mu(B)| \le \beta.$$

A process $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is called a $\beta$-independent process if $\mathcal{P}$ is $\beta$-independent of the
past $\mathcal{P}^-$.
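For one fixed finite partition $\mathcal{B}'$ the sum in Definition 2 can be computed directly. The sketch below covers only that finite case (the definition itself quantifies over all countable measurable partitions); the joint table layout `joint[a][b]` and the name `dependence` are our own assumptions.

```python
def dependence(joint):
    """Sum over A in P and B in B' of |mu(A n B) - mu(A) mu(B)| for a joint
    table joint[a][b] = mu(A_a intersected with B_b)."""
    row = [sum(r) for r in joint]                                   # mu(A_a)
    col = [sum(r[b] for r in joint) for b in range(len(joint[0]))]  # mu(B_b)
    return sum(abs(joint[a][b] - row[a] * col[b])
               for a in range(len(joint))
               for b in range(len(joint[0])))


independent = [[0.25, 0.25], [0.25, 0.25]]
identical = [[0.5, 0.0], [0.0, 0.5]]
print(dependence(independent), dependence(identical))
```

The independent table gives 0, while two identical two-set partitions of equal measure give 1, so the first pair is $\beta$-independent (with respect to this particular $\mathcal{B}'$) for every $\beta > 0$ and the second only for $\beta \ge 1$.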

A partition $\mathcal{P}$ is independent of another partition or a sigma-field $\mathcal{B}$ if and only
if $H(\mathcal{P}|\mathcal{B}) = H(\mathcal{P})$. The following approximate version of this fact holds (see
[Sh, Lemma 7.3] for finite partitions, from which the case of a sigma-field is easily
derived).

Fact 1. A partition $\mathcal{P}$ is $\beta$-independent of another partition or a sigma-field $\mathcal{B}$ if
$H(\mathcal{P}|\mathcal{B}) \ge H(\mathcal{P}) - \xi$, for $\xi$ sufficiently small. □

In course of the proof, a certain lengthy condition will be in frequent use. Let 
us introduce an abbreviation: 

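Definition 3. Given a partition $\mathcal{P}$ of a space with a probability measure $\mu$ and
$\delta > 0$, we will say that a property $\Phi(A)$ holds for $A \in \mathcal{P}$ with $\mu$-tolerance $\delta$ if

$$\mu\Bigl(\bigcup\{A \in \mathcal{P} : \Phi(A) \text{ holds}\}\Bigr) \ge 1 - \delta.$$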



We shall also need an elementary estimate, whose proof is an easy exercise. 

Fact 2. For each $A \in \mathcal{P}$, $H(\mathcal{P}) \le (1 - \mu(A)) \log_2(\#\mathcal{P}) + 1$. □
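A quick numerical sanity check of Fact 2 on one sample distribution may help fix ideas. This is only a sketch; the probability vector below is an arbitrary example of ours and the helper names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (base 2) of a finite probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def fact2_bound(probs, a):
    """Right-hand side of Fact 2 for the a-th element of the partition."""
    return (1 - probs[a]) * math.log2(len(probs)) + 1


probs = [0.9, 0.05, 0.03, 0.02]   # one large element and three small ones
print(entropy(probs))
print([fact2_bound(probs, a) for a in range(len(probs))])
```

The bound is informative precisely when one element carries most of the mass, which is how it is used in the proof: a large element forces small entropy.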

In addition to the random variables of absolute and normalized return times $R_B$
and $\overline{R}_B$, we will also use the analogous notions of the $k$th absolute return time

$$R_B^{(k)}(y) = \min\{i : \#\{0 < j \le i : \sigma^j(y) \in B\} = k\},$$

and of the normalized $k$th return time $\overline{R}_B^{(k)} = \mu(B) R_B^{(k)}$ (both defined on $B$), with
$F_B^{(k)}$ always denoting the distribution function of the latter. Clearly, the expected
value of $\overline{R}_B^{(k)}$ equals $k$.
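On a finite stretch of a realization, the $k$th absolute return time can be read off by counting visits. The sketch below assumes the realization is encoded as a 0/1 list in which a 1 marks a position where $B$ occurs; the function name `kth_return_time` and the sample list are ours.

```python
def kth_return_time(visits, start, k):
    """Smallest i > 0 such that exactly k visits to B occur among the
    positions start+1, ..., start+i. Returns None if the finite list
    `visits` contains fewer than k visits after `start`."""
    if not visits[start]:
        raise ValueError("the starting point must lie in B")
    count = 0
    for i in range(1, len(visits) - start):
        count += visits[start + i]
        if count == k:
            return i
    return None


visits = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1]
print([kth_return_time(visits, 0, k) for k in (1, 2, 3)])
```

Multiplying these values by the frequency of $B$ gives the normalized $k$th return times.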

The idea of the proof and the basic lemma 

Before we pass to the formal proof of Theorem 1, we would like to orient the reader
in the main framework of the idea behind it. We intend to estimate (from above,
by $1 - e^{-t} + \varepsilon$) the function $G_{BA}$, for long blocks of the form $BA \in \mathcal{P}^{[-n,r)}$. The
"positive" part $A$ has a fixed length $r$, while we allow the "negative" part $B$ to be
arbitrarily long. There are two key ingredients leading to the estimation. The first
one, contained in Lemma 3, is the observation that for a fixed typical $B \in \mathcal{P}^{-n}$, the
part of the process induced on $B$ (with the conditional measure $\mu_B$) generated by
the partition $\mathcal{P}^r$, is not only a $\beta$-independent process, but it is also $\beta$-independent
of many return times $R_B^{(k)}$ of the cylinder $B$ (see Figure 2).



Figure 2: The process $\dots A_{-1}A_0A_1A_2\dots$ of $r$-blocks following the copies of $B$ is a $\beta$-independent
process with additional $\beta$-independence properties from the positioning of the copies of $B$.
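This allows us to decompose (with high accuracy) the distribution function $F_{BA}$ of
the normalized return time of $BA$ as follows:

$$F_{BA}(t) = \mu_{BA}\{\overline{R}_{BA} \le t\} = \mu_{BA}\{R_{BA} \le \tfrac{t}{\mu(BA)}\} =$$

$$\sum_{k \ge 1} \mu_{BA}\{R_A^{(B)} = k,\ R_B^{(k)} \le \tfrac{t}{\mu(BA)}\} \approx \sum_{k \ge 1} \mu_{BA}\{R_A^{(B)} = k\} \cdot \mu_B\{R_B^{(k)} \le \tfrac{t}{\mu(BA)}\} \approx$$

$$\sum_{k \ge 1} p(1-p)^{k-1} \cdot F_B^{(k)}(\tfrac{t}{p}),$$

where $R_A^{(B)}$ denotes the first (absolute) return time of $A$ in the process induced on
$B$, and $p = \mu_B(A)$ (so that $\mu(BA) = p\,\mu(B)$).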

The second key observation is, assuming for simplicity full independence, that 
when trying to model some repelling for the blocks BA, we ascertain that it is 
largest, when the occurrences of B are purely periodic. Any deviation from period- 
icity of the B's may only lead to increasing the intensity of attracting between the 
copies of BA, never that of repelling. We will explain this phenomenon more for- 
mally in a moment. Now, if B does appear periodically, then the normalized return 
time of BA is governed by the same geometric distribution as the normalized return 
time of A in the independent process induced on B. If p is small, this geometric 
distribution function becomes nearly the unbiased exponential law $1 - e^{-t}$. The
smallness of p is a priori regulated by the choice of the parameter r (Lemma 1). 
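How close the normalized geometric law is to the exponential law for small $p$ can be quantified numerically. The sketch below is an illustration under our own conventions: it takes the distribution function of $p$ times a geometric($p$) waiting time, $1 - (1-p)^{\lfloor t/p \rfloor}$, and measures its largest deviation from $1 - e^{-t}$ over $t \in [0, 50]$; the function name and the truncation at 50 are ours.

```python
import math


def sup_distance(p, t_max=50.0):
    """Largest value of |F_geom(t) - (1 - e^{-t})| over 0 <= t <= t_max, where
    F_geom(t) = 1 - (1 - p)^floor(t / p). The step function is constant on
    [kp, (k+1)p), so it suffices to compare at both ends of each step."""
    worst = 0.0
    for k in range(int(t_max / p)):
        step = (1 - p) ** k          # value of 1 - F_geom on [kp, (k+1)p)
        worst = max(worst,
                    abs(math.exp(-k * p) - step),
                    abs(math.exp(-(k + 1) * p) - step))
    return worst


for p in (0.1, 0.01, 0.001):
    print(p, sup_distance(p))
```

In these runs the deviation is of the order of $p$ itself, with the largest gap occurring just before the first possible return.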

The phenomenon that, assuming full independence, the repelling of $BA$ is maximized
by periodic occurrences of $B$, and that even then there is nearly no repelling,
is captured by the following elementary lemma, which will also be useful later, near
the end of the rigorous proof.

Lemma 0. Fix some $p \in (0,1)$. Let $F^{(k)}$ ($k \ge 1$) be a sequence of distribution
functions on $[0, \infty)$ such that the expected value of the distribution associated to
$F^{(k)}$ equals $k$. Define

$$F(t) = \sum_{k \ge 1} p(1-p)^{k-1} F^{(k)}(\tfrac{t}{p}), \quad \text{and} \quad G(t) = \int_0^t 1 - F(s)\,ds.$$

Then $G(t) \le \frac{1}{(1-p)\log e_p}(1 - e_p^{-t})$, where $e_p = (1-p)^{-\frac{1}{p}}$.
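Proof. We have

$$G(t) = \sum_{k \ge 1} p(1-p)^{k-1} \int_0^t 1 - F^{(k)}(\tfrac{s}{p})\,ds.$$

We know that $F^{(k)}(t) \in [0, 1]$ and that $\int_0^\infty 1 - F^{(k)}(s)\,ds = k$ (the expected value).
With such constraints, it is the indicator function $1_{[k,\infty)}$ that maximizes the integrals
from 0 to $t$ simultaneously for every $t$ (because the "mass" $k$ above the graph
is for such choice of the function swept maximally to the left). The rest follows by
direct calculations:

$$G(t) \le \sum_{k \ge 1} p(1-p)^{k-1} \int_0^t 1_{[0,k)}(\tfrac{s}{p})\,ds = \int_0^t \sum_{k = \lceil s/p \rceil}^{\infty} p(1-p)^{k-1}\,ds =$$

$$\int_0^t (1-p)^{\lceil s/p \rceil - 1}\,ds \le \frac{p}{\log(1-p)} \cdot \frac{(1-p)^{\frac{t}{p}} - 1}{1-p} = \frac{1}{(1-p)\log e_p}(1 - e_p^{-t}). \quad □$$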
Recall that the maximizing distribution functions $F_B^{(k)} = 1_{[k,\infty)}$ occur, for the
normalized return time of a set $B$, precisely when $B$ is visited periodically. This
explains our former statement on this subject.
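The estimate of Lemma 0 can be checked numerically in the extremal (periodic) case. The sketch below assumes every $F^{(k)}$ is the indicator of $[k, \infty)$, in which case $G(t)$ is the integral over $[0, t]$ of $(1-p)^{\lfloor s/p \rfloor}$ and has a closed form; the function names and the sample values of $p$ and $t$ are ours.

```python
import math


def G_periodic(t, p):
    """G(t) when every F^(k) is the indicator of [k, oo): the integral over
    [0, t] of (1 - p)^floor(s / p), computed in closed form."""
    m = int(t / p)                       # number of complete steps of length p
    return 1 - (1 - p) ** m + (t - m * p) * (1 - p) ** m


def lemma0_bound(t, p):
    """(1 - e_p^{-t}) / ((1 - p) log e_p), with e_p = (1 - p)^(-1/p)."""
    log_ep = -math.log(1 - p) / p
    return (1 - math.exp(-t * log_ep)) / ((1 - p) * log_ep)


for p in (0.5, 0.1, 0.001):
    for t in (0.3, 1.0, 2.5):
        print(p, t, G_periodic(t, p), lemma0_bound(t, p), 1 - math.exp(-t))
```

For small $p$ both the extremal $G(t)$ and the bound are close to $1 - e^{-t}$, which illustrates why even periodic occurrences of $B$ leave almost no room for repelling of $BA$.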

Let us comment a bit more on the first key ingredient, the $\beta$-independence. Establishing
it is the most complicated part of the argument. The idea is to prove
conditional (given a "finite past" $\mathcal{P}^{-n}$) $\beta$-independence of the "present" $\mathcal{P}^r$ from
jointly the full past and a large part of the future, responsible for the return times
of the majority of the blocks $B \in \mathcal{P}^{-n}$. But the future part must not be too large.
Let us mention the existence of "bilaterally deterministic" processes with positive
entropy (first discovered by Gurevic [G], see also [O-W1]), in which the sigma-fields
generated by the coordinates $(-\infty, -m] \cup [m, \infty)$ do not decrease with $m$ to the
Pinsker factor; they are all equal to the entire sigma-field. (Coincidentally, our Example 1
has precisely this property; see Remark 2.) Thus, in order to maintain
any trace of independence of the "present" from our sigma-field already containing
the entire past, its part in the future must be selected with extreme care. Let us
also remark that an attempt to save on the future sigma-fields by adjusting them
individually to each block $B_0 \in \mathcal{P}^{-n}$ falls short, mainly because of the "off diagonal
effect": suppose $\mathcal{P}^r$ is conditionally (given $\mathcal{P}^{-n}$) nearly independent of a sigma-field
which determines the return times of only one selected block $B_0 \in \mathcal{P}^{-n}$. The
independence still holds conditionally given any cylinder $B \in \mathcal{P}^{-n}$ from a collection
of a large measure, but unfortunately, this collection can always miss the selected
cylinder $B_0$. In Lemmas 2 and 3, we succeed in finding a sigma-field (containing the
full past and a part of the future), of which $\mathcal{P}^r$ is conditionally $\beta$-independent, and
which "nearly determines", for the majority of blocks $B \in \mathcal{P}^{-n}$, some finite number
of their sequential return times (probably not all of them). This finite number is
sufficient to allow the decomposition of the distribution function $F_{BA}$ described
earlier.

The proof 

Throughout the sequel we assume ergodicity and that the entropy $h$ of $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$
is positive. We begin our computations with an auxiliary lemma allowing us to
assume (by replacing $\mathcal{P}$ by some $\mathcal{P}^r$) that the elements of the "present" partition
are small, relatively in most of $B \in \mathcal{P}^{-n}$ and for every $n$. Note that the
Shannon-McMillan-Breiman Theorem is insufficient: for the conditional measure the error
term depends increasingly on $n$, which we do not fix.

Lemma 1. For each $\delta$ there exists an $r \in \mathbb{N}$ such that for every $n \in \mathbb{N}$ the following
holds for $B \in \mathcal{P}^{-n}$ with $\mu$-tolerance $\delta$:

for every $A \in \mathcal{P}^r$, $\mu_B(A) \le \delta$.



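Proof. Let $\alpha$ be so small that

$$\sqrt{\alpha} \le \delta \quad \text{and} \quad \frac{h - 3\sqrt{\alpha}}{h + \alpha} \ge 1 - \frac{\delta}{2},$$

and set $\gamma = \frac{\alpha}{\log_2(\#\mathcal{P})}$. Let $r$ be so big that

$$\frac{1}{r} \le \alpha, \qquad \frac{1}{r(h+\alpha)} \le \frac{\delta}{2},$$

and that there exists a collection $\overline{\mathcal{P}}^r$ of no more than $2^{r(h+\alpha)} - 1$ elements of $\mathcal{P}^r$
whose joint measure $\mu$ exceeds $1 - \gamma$ (by the Shannon-McMillan-Breiman Theorem).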

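Let $\widetilde{\mathcal{P}}^r$ denote the partition into the elements of the collection $\overline{\mathcal{P}}^r$ chosen above and the
complement of their union, and let $\mathcal{R}$ be the partition into the remaining elements of $\mathcal{P}^r$ and the
complement of their union, so that $\mathcal{P}^r = \widetilde{\mathcal{P}}^r \vee \mathcal{R}$. For any $n$ we have

$$rh = H(\mathcal{P}^r|\mathcal{P}^-) \le H(\mathcal{P}^r|\mathcal{P}^{-n}) = H(\widetilde{\mathcal{P}}^r \vee \mathcal{R}|\mathcal{P}^{-n}) =$$
$$H(\widetilde{\mathcal{P}}^r|\mathcal{R} \vee \mathcal{P}^{-n}) + H(\mathcal{R}|\mathcal{P}^{-n}) \le H(\widetilde{\mathcal{P}}^r|\mathcal{P}^{-n}) + H(\mathcal{R}) \le$$
$$\sum_{B \in \mathcal{P}^{-n}} \mu(B) H_B(\widetilde{\mathcal{P}}^r) + \gamma r \log_2(\#\mathcal{P}) + 1$$

(we have used Fact 2 for the last passage). After dividing by $r$, we obtain

$$\sum_{B \in \mathcal{P}^{-n}} \mu(B) \tfrac{1}{r} H_B(\widetilde{\mathcal{P}}^r) \ge h - \gamma \log_2(\#\mathcal{P}) - \tfrac{1}{r} \ge h - 2\alpha.$$

Because each term $\frac{1}{r} H_B(\widetilde{\mathcal{P}}^r)$ is not larger than $\frac{1}{r} \log_2(\#\widetilde{\mathcal{P}}^r)$, which was set to be at
most $h + \alpha$, we deduce that

$$\tfrac{1}{r} H_B(\widetilde{\mathcal{P}}^r) \ge h - 3\sqrt{\alpha}$$

holds for $B \in \mathcal{P}^{-n}$ with $\mu$-tolerance $\sqrt{\alpha}$, hence also with $\mu$-tolerance $\delta$. On the
other hand, by Fact 2, for any $B$ and $A \in \widetilde{\mathcal{P}}^r$, holds:

$$H_B(\widetilde{\mathcal{P}}^r) \le (1 - \mu_B(A)) \log_2(\#\widetilde{\mathcal{P}}^r) + 1 \le (1 - \mu_B(A))\, r(h + \alpha) + 1.$$

Combining the last two displayed inequalities we establish that, with $\mu$-tolerance $\delta$
for $B \in \mathcal{P}^{-n}$ and then for every $A \in \widetilde{\mathcal{P}}^r$, holds

$$1 - \mu_B(A) \ge \frac{h - 3\sqrt{\alpha}}{h + \alpha} - \frac{1}{r(h + \alpha)} \ge 1 - \delta.$$

So, $\mu_B(A) \le \delta$. Because $\mathcal{P}^r$ refines $\widetilde{\mathcal{P}}^r$, the elements of $\mathcal{P}^r$ are not larger. □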

We continue the proof with a lemma which can be deduced from [Rl, Lemma 3].
We provide a direct proof. For $\alpha > 0$ and $M \in \mathbb{N}$ let

$$S(M, \alpha) = \bigcup_{m \in \mathbb{Z}} [mM + \alpha M, (m+1)M - \alpha M) \cap \mathbb{Z}.$$

Lemma 2. For fixed $\alpha$ and $r$ there exists $M_0$ such that for every $M \ge M_0$ holds

$$H(\mathcal{P}^r|\mathcal{P}^- \vee \mathcal{P}^{S(M,\alpha)}) \ge rh - \alpha$$

(see Figure 3).



Figure 3. The circles indicate the coordinates 0 through $r - 1$, the conditioning sigma-field is over
the coordinates marked by stars, which includes the entire past and part of the future with gaps
of size $2\alpha M$ repeated periodically with period $M$ (the first gap is half the size).
Proof. First assume that $r = 1$. Denote also

$$S'(M, \alpha) = \bigcup_{m \in \mathbb{Z}} [mM + \alpha M, (m+1)M) \cap \mathbb{Z}.$$

Let $M$ be so large that $H(\mathcal{P}^{(1-\alpha)M}) < (1-\alpha)M(h + \gamma)$, where $\gamma = \frac{\alpha^2}{2(1-\alpha)}$. Then,
for any $m \ge 1$,

$$H(\mathcal{P}^{S'(M,\alpha) \cap [0,mM)}|\mathcal{P}^-) \le H(\mathcal{P}^{S'(M,\alpha) \cap [0,mM)}) < (1-\alpha)mM(h + \gamma).$$

Because $H(\mathcal{P}^{[0,mM)}|\mathcal{P}^-) = mMh$, the complementary part of entropy must exceed
$mMh - (1-\alpha)mM(h + \gamma)$ (which equals $\alpha mM(h - \frac{\alpha}{2})$), i.e., we have

$$H(\mathcal{P}^{[0,mM) \setminus S'(M,\alpha)}|\mathcal{P}^- \vee \mathcal{P}^{S'(M,\alpha) \cap [0,mM)}) > \alpha mM(h - \tfrac{\alpha}{2}).$$

Breaking the last entropy term as a sum over $j \in [0,mM) \setminus S'(M,\alpha)$ of the conditional
entropies of $\sigma^{-j}(\mathcal{P})$ given the sigma-field over all coordinates left of $j$ and
all coordinates from $S'(M,\alpha) \cap [0,mM)$ right of $j$, and because every such term is
at most $h$, we deduce that more than half of these terms reach or exceed $h - \alpha$.
So, a term not smaller than $h - \alpha$ occurs for a $j$ within one of the gaps in the left
half of $[0,mM)$. Shifting by $j$, we obtain $H(\mathcal{P}|\mathcal{P}^- \vee \sigma^i(\mathcal{P}^{S'(M,\alpha) \cap [0,\frac{mM}{2})})) \ge h - \alpha$,
where $i \in [0, \alpha M)$ denotes the relative position of $j$ in the gap. As we increase $m$,
one value $i$ will repeat in this role along a subsequence $m'$. The operation $\vee$ is continuous
for increasing sequences of sigma-fields, hence $\mathcal{P}^- \vee \sigma^i(\mathcal{P}^{S'(M,\alpha) \cap [0,\frac{m'M}{2})})$
converges over $m'$ to $\mathcal{P}^- \vee \sigma^i(\mathcal{P}^{S'(M,\alpha)})$. The entropy is continuous for such a passage,
hence $H(\mathcal{P}|\mathcal{P}^- \vee \sigma^i(\mathcal{P}^{S'(M,\alpha)})) \ge h - \alpha$. The assertion now follows because
$S(M,\alpha)$ is contained in $S'(M,\alpha)$ shifted to the left by any $i \in [0, \alpha M)$.

Finally, if $r > 1$, we can simply argue for $\mathcal{P}^r$ replacing $\mathcal{P}$. This will impose
that $M_0$ and $M$ are divisible by $r$, but it is not hard to see that for large $M$ the
argument works without divisibility at a cost of a slight adjustment of $\alpha$. □
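The containment used in the last step of the case $r = 1$ is a purely combinatorial statement about the index sets, and can be checked on a finite window. The sketch below is only an illustration: it assumes that $\alpha M$ is an integer (called `g` here), restricts the union to finitely many values of $m$, and uses function names of our own.

```python
def S(M, g, m_range):
    """Coordinates in the union over m of [mM + g, (m+1)M - g), g = alpha*M."""
    return {x for m in m_range for x in range(m * M + g, (m + 1) * M - g)}


def S_prime(M, g, m_range):
    """Coordinates in the union over m of [mM + g, (m+1)M)."""
    return {x for m in m_range for x in range(m * M + g, (m + 1) * M)}


M, g = 20, 2                      # period 20, alpha = 0.1, gaps of size 2g = 4
window = range(-3, 4)
base = S(M, g, window)
shifted_ok = all(base <= {x - i for x in S_prime(M, g, window)}
                 for i in range(g))
print(shifted_ok, M - len(S(M, g, range(1))))
```

The second printed value is the number of coordinates missing from each period of $S(M, \alpha)$, which equals $2\alpha M$ as stated in the caption of Figure 3.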

For a long block $B \in \mathcal{P}^{-n}$ let $((\mathcal{P}_B^r)^{\mathbb{Z}}, \mu_B, \sigma_B)$ denote the process induced on $B$
generated by the restriction $\mathcal{P}_B^r$ of $\mathcal{P}^r$ to $B$ ($\sigma_B$ is the first return time map on $B$).
The following lemma is the crucial item in our argument.

Lemma 3. For every $\beta > 0$, $r \in \mathbb{N}$ and $K \in \mathbb{N}$ there exists $n_0$ such that for every
$n \ge n_0$, with $\mu$-tolerance $\beta$ for $B \in \mathcal{P}^{-n}$, with respect to $\mu_B$, $\mathcal{P}^r$ is $\beta$-independent
of jointly the past $\mathcal{P}^-$ and the first $K$ return times to $B$, $R_B^{(k)}$ ($k \in [1, K]$). In
particular, $((\mathcal{P}_B^r)^{\mathbb{Z}}, \mu_B, \sigma_B)$ is a $\beta$-independent process.

Proof. We choose $\xi$ according to Fact 1, so that $\frac{\beta}{2}$-independence is implied. Let $\alpha$
satisfy

Let $n_0$ be so large that $H(\mathcal P^r \mid \mathcal P^{-n}) < rh + \alpha$ for every $n \ge n_0$ and that for every
$k \in [1,K]$ with $\mu$-tolerance $\alpha$ for $B \in \mathcal P^{-n}$ holds

$\mu_B\{2^{n(h-\alpha)} \le R_B^{(k)} \le 2^{n(h+\alpha)}\} > 1 - \alpha$

(we are using the Ornstein-Weiss Theorem [O-W2]; the multiplication by $k$ is consumed
by $\alpha$ in the exponent). Let $M_0 \ge 2^{n_0(h-\alpha)}$ be so large that the assertion of Lemma 2
holds for $\alpha$, $r$ and $M_0$, and that for every $M \ge M_0$,

(M + l) I+ ^<aM 2 and < a. 
We can now redefine (enlarge) $n_0$ and $M_0$ so that $M_0 = \lfloor 2^{n_0(h-\alpha)} \rfloor$. Similarly, for
each $n \ge n_0$ we set $M_n = \lfloor 2^{n(h-\alpha)} \rfloor$. Observe that the interval where the first
$K$ returns of most $n$-blocks $B$ may occur (up to probability $\alpha$) is contained in
$[M_n, \alpha M_n^2]$ (because $2^{n(h+\alpha)} < (M_n+1)^{\frac{h+\alpha}{h-\alpha}} < \alpha M_n^2$).

At this point we fix some $n \ge n_0$. The idea is to carefully select an $M$ between
$M_n$ and $2M_n$ (hence not smaller than $M_0$), such that the initial $K$ returns of nearly
every $n$-block happen most likely inside (with all its $n$ symbols) the set $S(M,\alpha)$,
so that they are "controlled" by the sigma-field $\mathcal P^{S(M,\alpha)}$. Let $\alpha' = \alpha + \frac{n}{M}$ so that
every $n$-block overlapping with $S(M,\alpha')$ is completely covered by $S(M,\alpha)$. By the
second assumption on $M \ge M_0$ and by the formula connecting $M_n$ and $n$, we have
$\alpha' < 2\alpha$. To define $M$ we will invoke the triple Fubini Theorem. Fix $k \in [1,K]$ and
consider the probability space

$\mathcal P^{-n} \times [M_n, 2M_n] \times \mathbb N$

equipped with the (discrete) measure $\mathbb M$ whose marginal on $\mathcal P^{-n} \times [M_n, 2M_n]$ is
the product of $\mu$ (more precisely, of its projection onto $\mathcal P^{-n}$) with the uniform
distribution on the integers in $[M_n, 2M_n]$, while, for fixed $B$ and $M$, the measure
on the corresponding $\mathbb N$-section is the distribution of the random variable $R_B^{(k)}$. In
this space let $\mathcal S$ be the set whose $\mathbb N$-section for a fixed $M$ (and any fixed $B$) is
the set $S(M,\alpha')$. We claim that for every $l \in [M_n, \alpha M_n^2] \cap \mathbb N$ (and any fixed $B$)
the $[M_n, 2M_n]$-section of $\mathcal S$ has measure exceeding $1 - 16\alpha$. This is quite obvious
(even for every $l \in [M_n, \infty)$ and with $1 - 15\alpha$) if $[M_n, 2M_n]$ is equipped with the
normalized Lebesgue measure (see Figure 4).




Figure 4: The complement of $\mathcal S$ splits into thin skew strips shown in the picture. The
normalized Lebesgue measure of any vertical section of the $j$-th strip (starting at $jM_n$ with $j \ge 1$)
is at most -^Jri < ~p < ~p ■ Each vertical line at I > M„ intersects strips with indices 
$j, j+1, j+2, \dots$ up to at most $2j$ (for some $j$), so the joint measure of the complement of the section
of $\mathcal S$ does not exceed $15\alpha$.




Figure 5: The discretization replaces the Lebesgue measure by the uniform measure on $M_n$
integers, thus the measure of any interval can deviate from its Lebesgue measure by at most $\frac{1}{M_n}$.
For $l \le \alpha M_n^2$ the corresponding section of $\mathcal S$ (in this picture drawn horizontally) consists of at
most $\alpha M_n$ intervals, so its measure can deviate by no more than $\alpha$.
In the discrete case, however, a priori it might happen that the integers along
some $[M_n, 2M_n]$-section often "miss" the section of $\mathcal S$, leading to a decreased measure
value. (For example, it is easy to see that for $l = (2M_n)!$ the measure of the section
of $\mathcal S$ is zero.) But because we restrict to $l \le \alpha M_n^2$, the discretization does not affect
the measure of the section of $\mathcal S$ by more than $\alpha$, and the estimate with $1 - 16\alpha$
holds (see Figure 5 above).
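
The discretization claim can be probed numerically. The sketch below is only an illustration under an explicit assumption: it models $S(M,a)$ as the set of integers at distance at least $aM$ from every multiple of $M$ (the precise definition is given with Lemma 2 and may differ in detail). The function names and the values of `Mn` and `alpha` are hypothetical choices, and `a_prime = 2 * alpha` is used as the worst case allowed by $\alpha' < 2\alpha$.

```python
from math import factorial

def in_S(l, M, a):
    """Assumed model of S(M, a): l is at distance >= a*M from every multiple of M."""
    r = l % M
    return a * M <= r <= M - a * M

def section_measure(l, Mn, a):
    """Fraction of integers M in [Mn, 2*Mn] for which l lies in S(M, a)."""
    Ms = range(Mn, 2 * Mn + 1)
    return sum(in_S(l, M, a) for M in Ms) / len(Ms)

Mn, alpha = 1000, 0.01
a_prime = 2 * alpha  # the text only guarantees a' < 2*alpha; use the worst case

# Sample l in the restricted range [Mn, alpha * Mn**2].
worst = min(section_measure(l, Mn, a_prime)
            for l in range(Mn, int(alpha * Mn**2) + 1, 37))
print("smallest sampled section measure:", worst)

# Outside the restricted range the integers can miss S entirely:
# every M <= 2*Mn divides (2*Mn)!, so l = (2*Mn)! is a multiple of each M.
print("section measure at l = (2*Mn)!:",
      section_measure(factorial(2 * Mn), Mn, a_prime))
```

Under this model the sampled sections stay above $1 - 16\alpha$, while the factorial example gives a section of measure zero, as the text predicts.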

Taking into account all other inaccuracies (the smaller than $\alpha$ part of $\mathcal S$ outside
$[M_n, \alpha M_n^2]$ and the smaller than $\alpha$ part of $\mathcal S$ projecting onto blocks $B$ which do not
obey the Ornstein-Weiss return time estimate) it is safe to claim that

$\mathbb M(\mathcal S) > 1 - 18\alpha.$

This implies that for every $M$ from a set of measure at least $1 - 18\sqrt{\alpha}$ the measure
of the $(\mathcal P^{-n} \times \mathbb N)$-section of $\mathcal S$ is larger than or equal to $1 - \sqrt{\alpha}$. For every such
$M$, with $\mu$-tolerance $\sqrt[4]{\alpha}$ for $B \in \mathcal P^{-n}$, the probability $\mu_B$ that the $k$-th repetition
of $B$ falls in $S(M,\alpha')$ (hence with all its $n$ terms inside the set $S(M,\alpha)$) is at least
$1 - \sqrt[4]{\alpha}$.
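
Both steps of this paragraph are instances of Markov's inequality. The following sketch spells them out, assuming the thresholds are $18\sqrt{\alpha}$, $\sqrt{\alpha}$ and $\sqrt[4]{\alpha}$ (our reading, chosen because it matches the later requirement $18K\sqrt{\alpha} < 1$). The functions $c(M)$ and $d(B)$ are names introduced here for illustration: $c(M)$ is the measure of the complement of the section at $M$, and $d(B)$ is the $\mu_B$-probability that the $k$-th return misses $S(M,\alpha')$.

```latex
\begin{align*}
\mathbb{E}_M\, c(M) < 18\alpha
  \;&\Longrightarrow\;
  \Pr\{M : c(M) > \sqrt{\alpha}\} < \frac{18\alpha}{\sqrt{\alpha}} = 18\sqrt{\alpha},\\
\text{and, for fixed } M \text{ with } c(M) \le \sqrt{\alpha}:\quad
\int d(B)\, d\mu(B) \le \sqrt{\alpha}
  \;&\Longrightarrow\;
  \mu\{B : d(B) > \sqrt[4]{\alpha}\} \le \frac{\sqrt{\alpha}}{\sqrt[4]{\alpha}} = \sqrt[4]{\alpha}.
\end{align*}
```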

Because $18K\sqrt{\alpha} < 1$, there exists at least one $M$ for which the above holds for
every $k \in [1,K]$. This is our final choice of $M$. For this $M$, with $\mu$-tolerance $K\sqrt[4]{\alpha}$,
all considered $K$ returns of $B$ are, with probability $1 - \sqrt[4]{\alpha}$ (each), determined by the
sigma-field $\mathcal P^{S(M,\alpha)}$. More precisely, for each $k \in [1,K]$ there is a set $U_k$ of measure
$\mu_B$ at most $\sqrt[4]{\alpha}$ such that the sets $\{R_B^{(k)} = i\} \setminus U_k$ coincide, outside $U_k$, with
$\mathcal P^{S(M,\alpha)}$-measurable sets. Thus, we can modify the variable $R_B^{(k)}$ so that
it is $\mathcal P^{S(M,\alpha)}$-measurable and equal to the original except on $U_k$. We denote such
a modification by $\tilde R_B^{(k)}$.

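Let us go back to our entropy estimates. We have, by Lemma 2,

$\sum_{B \in \mathcal P^{-n}} \mu(B) H_B(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) = H(\mathcal P^r \mid \mathcal P^{-n} \vee \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) =$

$H(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \ge rh - \alpha > H(\mathcal P^r \mid \mathcal P^{-n}) - 2\alpha = \sum_{B \in \mathcal P^{-n}} \mu(B) H_B(\mathcal P^r) - 2\alpha.$

Because $H_B(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \le H_B(\mathcal P^r)$ for every $B$, we deduce that with
$\mu$-tolerance $\sqrt{2\alpha}$ for $B \in \mathcal P^{-n}$ must hold

$H_B(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \ge H_B(\mathcal P^r) - \sqrt{2\alpha} \ge H_B(\mathcal P^r) - \xi.$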

Combining this with the preceding arguments, with $\mu$-tolerance $K\sqrt[4]{\alpha} + \sqrt{2\alpha} < \beta$ for
$B \in \mathcal P^{-n}$ both the above entropy inequality holds, and we have the $\mathcal P^{S(M,\alpha)}$-measurable
modifications $\tilde R_B^{(k)}$ of the return times. By the choice of $\xi$, we obtain that
with respect to $\mu_B$, $\mathcal P^r$ is jointly $\frac{\beta}{2}$-independent of the past and the modified return
times $\tilde R_B^{(k)}$ ($k \in [1,K]$). Because $\mu_B(\bigcup_{k \in [1,K]} U_k) \le K\sqrt[4]{\alpha} < \frac{\beta}{2}$, this clearly implies
$\beta$-independence if each $\tilde R_B^{(k)}$ is replaced by $R_B^{(k)}$. □

To complete the proof of Theorem 1 it now remains to put the items together. 
Proof of Theorem 1. Fix an $\epsilon > 0$. On $[0,\infty)$, the functions

$g_p(t) = \min\{1, \frac{1}{\log e_p}(1 - e_p^{-t}) + pt\},$

where $e_p = (1-p)^{-\frac{1}{p}}$, decrease uniformly to $1 - e^{-t}$ as $p \to 0^+$. So, let $\delta$ be such
that $g_\delta(t) \le 1 - e^{-t} + \epsilon$ for every $t$. We also assume that

$(1 - 2\delta)(1 - \delta) > 1 - \epsilon.$
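
As a numerical sanity check of the uniform convergence, the sketch below evaluates the bounding functions on a grid. It assumes the reading $g_p(t) = \min\{1, \frac{1}{\log e_p}(1 - e_p^{-t}) + pt\}$ with $e_p = (1-p)^{-1/p}$; the function names, the grid and the sample values of $p$ are illustrative choices only.

```python
import math

def g(p, t):
    """Assumed form g_p(t) = min{1, (1 - e_p**(-t)) / log(e_p) + p*t},
    with e_p = (1 - p)**(-1/p)."""
    log_ep = -math.log1p(-p) / p          # log(e_p), tends to 1 as p -> 0+
    return min(1.0, (1.0 - math.exp(-log_ep * t)) / log_ep + p * t)

def sup_gap(p, t_max=20.0, steps=4000):
    """Largest value of g_p(t) - (1 - e^{-t}) over a grid on [0, t_max]."""
    grid = (t_max * i / steps for i in range(steps + 1))
    return max(g(p, t) - (1.0 - math.exp(-t)) for t in grid)

for p in (0.1, 0.01, 0.001, 0.0001):
    print(p, sup_gap(p))
```

The printed gaps shrink as $p$ decreases, which is the behavior used to pick $\delta$ for a given $\epsilon$.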

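Let $r$ be specified by Lemma 1, so that $\mu_B(A) < \delta$ for every $n \ge 1$, every $A \in \mathcal P^r$
and for $B \in \mathcal P^{-n}$ with $\mu$-tolerance $\delta$. On the other hand, once $r$ is fixed, the
partition $\mathcal P^r$ has at most $(\#\mathcal P)^r$ elements, so with $\mu_B$-tolerance $\delta$ for $A \in \mathcal P^r$,
$\mu_B(A) \ge \delta(\#\mathcal P)^{-r}$. Let $\mathcal A_B$ be the subfamily of $\mathcal P^r$ (depending on $B$) where this
inequality holds. Let $K$ be so large that for any $p \ge \delta(\#\mathcal P)^{-r}$,

$\sum_{k=K+1}^{\infty} p(1-p)^{k-1} < \frac{\delta}{2},$

and choose $\beta < \delta$ so small that

$(K^2 + K + 1)\beta < \frac{\delta}{2}.$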
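
The choice of $K$ can be made explicit, because the geometric tail has the closed form $\sum_{k>K} p(1-p)^{k-1} = (1-p)^K$ and decreases in $p$. The sketch below assumes the two thresholds are both $\frac{\delta}{2}$ (our reading); the function names and the numerical values of `delta`, `card` (standing for $\#\mathcal P$) and `r` are illustrative only.

```python
import math

def tail(p, K):
    """Tail sum_{k > K} p (1-p)^(k-1) of the geometric distribution, in closed form."""
    return (1.0 - p) ** K

def smallest_K(delta, card, r):
    """Smallest K whose tail is below delta/2 for the least admissible
    p = delta * card**(-r). The tail decreases in p, so this K also works
    for every larger p."""
    p = delta * card ** (-r)
    K = max(0, math.ceil(math.log(delta / 2) / math.log1p(-p)))
    while tail(p, K) >= delta / 2:   # guard against floating-point rounding
        K += 1
    return K

delta, card, r = 0.1, 2, 3           # illustrative values only
K = smallest_K(delta, card, r)
# Any beta below this value satisfies (K^2 + K + 1) * beta < delta / 2.
beta_max = (delta / 2) / (K * K + K + 1)
print(K, beta_max)
```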

The application of Lemma 3 now provides an $n_0$ such that for any $n \ge n_0$, with
$\mu$-tolerance $\beta$ for $B \in \mathcal P^{-n}$, the process induced on $B$ generated by $\mathcal P^r$ has the
desired $\beta$-independence properties involving the initial $K$ return times of $B$. So,
with tolerance $\delta + \beta < 2\delta$ we have both the above $\beta$-independence and the estimate
$\mu_B(A) < \delta$ for every $A \in \mathcal P^r$. Let $\mathcal B_n$ be the subfamily of $\mathcal P^{-n}$ where these two
conditions hold. Fix some $n \ge n_0$.

Let us consider a cylinder set $B \cap A \in \mathcal P^{[-n,r)}$ (or, equivalently, the block $BA$),
where $B \in \mathcal B_n$, $A \in \mathcal A_B$. The length of $BA$ is $n + r$, which represents an arbitrary
integer larger than $n_0 + r$. Notice that the family of such sets $BA$ covers more than
$(1 - 2\delta)(1 - \delta) > 1 - \epsilon$ of the space.

We will examine the distribution of the normalized first return time for $BA$. In
addition to our customary notations of return times, let $R_A^{(B)}$ be the first (absolute)
return time of $A$ in $((\mathcal P_B)^{\mathbb Z}, \mu_B, \sigma_B)$, i.e., the variable defined on $BA$, counting the
number of visits to $B$ until the first return to $BA$. Let $p = \mu_B(A)$. We have

$F_{BA}(t) = \mu_{BA}\{\overline{R}_{BA} \le t\} = \mu_{BA}\{R_{BA} \le \frac{t}{\mu(BA)}\} = \sum_{k \ge 1} \mu_{BA}\{R_A^{(B)} = k,\ R_B^{(k)} \le \frac{t}{p\mu(B)}\}.$
fe>i 

The $k$-th term of this sum equals

$\frac{1}{p}\mu_B(\{A_k = A\} \cap \{A_{k-1} \ne A\} \cap \cdots \cap \{A_1 \ne A\} \cap \{A_0 = A\} \cap \{R_B^{(k)} \le \frac{t}{p\mu(B)}\}),$

where $A_i$ is the $r$-block following the $i$-th copy of $B$ (the counting starts from 0 at
the copy of $B$ positioned at $[-n,-1]$).

By Lemma 3, for $k \le K$, in this intersection of sets each term is $\beta$-independent
of the intersection right from it. So, proceeding from the left, we can replace the
probabilities of the intersections by products of probabilities, allowing an error of
$\beta$. Note that the last term equals $\mu_B\{R_B^{(k)} \le \frac{t}{p\mu(B)}\} = F_B^{(k)}(\frac{t}{p})$. Jointly, the inaccuracy
will not exceed $(K+1)\beta$:

$|\mu_{BA}\{R_A^{(B)} = k,\ R_B^{(k)} \le \frac{t}{p\mu(B)}\} - p(1-p)^{k-1} F_B^{(k)}(\frac{t}{p})| \le (K+1)\beta.$
Similarly, we also have $|\mu_{BA}\{R_A^{(B)} = k\} - p(1-p)^{k-1}| \le K\beta$, hence the tail of the
series $\mu_{BA}\{R_A^{(B)} = k\}$ above $K$ is smaller than $K^2\beta$ plus the tail of the geometric
series $p(1-p)^{k-1}$, which, by the fact that $p \ge \delta(\#\mathcal P)^{-r}$, is smaller than $\frac{\delta}{2}$. Therefore

$F_{BA}(t) \approx \sum_{k \ge 1} p(1-p)^{k-1} F_B^{(k)}(\frac{t}{p})$

up to $(K^2 + K + 1)\beta + \frac{\delta}{2} < \delta$, uniformly for every $t$. By the application of Lemma 0,
$G_{BA}$ satisfies

G BA (t) < min{l, - e"*) + <ft} < g 5 (t) < 1 - e* + e 

(because $p < \delta$). We have proved that for our choice of $\epsilon$ and an arbitrary length
$m \ge n_0 + r$, with $\mu$-tolerance $\epsilon$ for the cylinders $C \in \mathcal P^m$, the intensity of repelling
between visits to $C$ is at most $\epsilon$. This concludes the proof of Theorem 1. □

Consequences for limit laws 

The studies of limit laws for return/hitting time statistics are based on the 
following approach: For $x \in \mathcal P^{\mathbb Z}$ define $F_{x,n} = F_B$ (and $\tilde F_{x,n} = \tilde F_B$), where $B$ is
the block $x[0,n)$ (or the cylinder in $\mathcal P^n$ containing $x$). Because for nondecreasing
functions $F : [0,\infty) \to [0,1]$, the weak convergence coincides with the convergence
at continuity points, and it makes the space of such functions metric and compact,
for every $x$ there exists a well defined collection of limit distributions for $F_{x,n}$ (and
for $\tilde F_{x,n}$) as $n \to \infty$. They are called limit laws for the return (hitting) times at
$x$. Due to the integral relation ($\tilde F_B \approx G_B$) a sequence of return time distributions
converges weakly if and only if the corresponding hitting time distributions converge
pointwise (see [H-L-V]), so the limit laws for the return times completely determine
those for hitting times and vice versa. A limit law is essential if it appears along 
some subsequence $(n_k)$ for $x$'s in a set of positive measure. In particular, the
strongest situation occurs when there exists an almost sure limit law along the full
sequence $(n)$. Most of the results concerning the limit laws obtained so far can
be classified in three major groups: a) characterizations of possible essential limit
laws for specific zero entropy processes (e.g. [D-M], [C-K]; these limit laws are
usually atomic for return times or piecewise linear for hitting times), b) finding
classes of processes with an almost sure exponential limit law along $(n)$ (e.g. [A-G],
[H-S-V]), and c) results concerning non-essential limit laws, limit laws along
sets other than cylinders (see [L]; every probability distribution with expected
value not exceeding 1 can occur in any process as the limit law for such general 
return times), or other very specific topics. As a consequence of our Theorem 1, we 
obtain, for the first time, a serious bound on the possible essential limit laws for the 
hitting time statistics along cylinders in the general class of ergodic positive entropy 
processes. The statement (1) below is even slightly stronger, because we require, 
for a subsequence, convergence on a positive measure set, but not necessarily to a 
common limit. 
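
To make the objects $F_B$ concrete, the following sketch estimates the distribution of the normalized return time of one block in a simulated independent fair-coin process, where an exponential limit law is expected. Everything here is illustrative: the block `111000`, the sample length and the seed are hypothetical choices, and the empirical function `F` only approximates $F_B$ along a single sample path.

```python
import math
import random

def return_times(seq, block):
    """Gaps between consecutive occurrences of `block` in the string `seq`."""
    positions = []
    start = seq.find(block)
    while start != -1:
        positions.append(start)
        start = seq.find(block, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]

rng = random.Random(0)
n_symbols = 400_000
seq = "".join(rng.choice("01") for _ in range(n_symbols))

block = "111000"                 # hypothetical test block; it cannot overlap itself
mu_B = 2.0 ** (-len(block))      # measure of the cylinder in the fair-coin process
normalized = [gap * mu_B for gap in return_times(seq, block)]

def F(t):
    """Empirical distribution function of the normalized return time."""
    return sum(x <= t for x in normalized) / len(normalized)

print("mean normalized return time:", sum(normalized) / len(normalized))
for t in (0.5, 1.0, 2.0):
    print(t, round(F(t), 3), round(1 - math.exp(-t), 3))
```

The mean of the normalized return times should be close to 1 (Kac's formula), and the empirical values should be near $1 - e^{-t}$, up to sampling noise and the discreteness of a block of length 6.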

Theorem 2. Assume ergodicity and positive entropy of the process $(\mathcal P^{\mathbb Z}, \mu, \sigma)$.

(1) If a subsequence $(n_k)$ is such that $\tilde F_{x,n_k}$ converge pointwise to some limit
laws $\tilde F_x$ on a positive measure set $A$ of points $x$, then almost surely on $A$,
$\tilde F_x(t) \le 1 - e^{-t}$ at each $t \ge 0$.

(2) If $(n_k)$ grows sufficiently fast, then there is a full measure set, such that for
every $x$ in this set holds: $\limsup_k \tilde F_{x,n_k}(t) \le 1 - e^{-t}$ at each $t \ge 0$.



Proof. The implication from Theorem 1 to Theorem 2 is obvious and we leave it to
the reader. For (2) we hint that $(n_k)$ must grow fast enough to ensure summability
of the measures of the sets where the intensity of repelling persists. □

Examples 

The first construction will show that for each $\delta > 0$ and $n \in \mathbb N$ there exists
$N \in \mathbb N$ and an ergodic process on $N$ symbols with entropy $\log_2 N - \delta$, such that
the $n$-blocks from a collection of joint measure equal to $\frac{1}{n}$ repel with nearly the
maximal possible intensity $e^{-1}$. Because $\delta$ can be extremely small compared to $\frac{1}{n}$,
this construction illustrates that there is no "reduction of entropy" by an amount
proportional to the fraction of blocks which reveal strong repelling.

Example 1. Let $\mathcal P$ be an alphabet of a large cardinality $N$. Divide $\mathcal P$ into two
disjoint subsets, one, denoted $\mathcal P_0$, of cardinality $N_0 = N2^{-\delta}$ and the relatively small
(but still very large) rest which we denote by $\{1,2,\dots,r\}$ (we will refer to these
symbols as "markers"). For $i = 1,2,\dots,r$, let $\mathcal B_i$ be the collection of all $n$-blocks
whose first $n-1$ symbols belong to $\mathcal P_0$ and the terminal symbol is the marker
$i$. The cardinality of $\mathcal B_i$ is $N_0^{n-1}$. Let $\mathcal C_i$ be the collection of all blocks of length
$nN_0^{n-1}$ obtained as concatenations of blocks from $\mathcal B_i$ using each of them exactly
once. The cardinality of $\mathcal C_i$ is $(N_0^{n-1})!$. Let $X$ be the subshift whose points are
infinite concatenations of blocks from $\bigcup_{i=1}^{r} \mathcal C_i$, in which every block belonging to
$\mathcal C_i$ is followed by a block from $\mathcal C_{i+1}$ ($1 \le i < r$) and every block belonging to $\mathcal C_r$
is followed by a block from $\mathcal C_1$. Let $\mu$ be the shift-invariant measure of maximal
entropy on $X$. It is immediate to see that the entropy of $\mu$ is $\frac{1}{nN_0^{n-1}} \log_2((N_0^{n-1})!)$,
which, for large $N$, nearly equals $\log_2 N_0 = \log_2 N - \delta$. Finally observe that the
measure of each $B \in \mathcal B_i$ equals $\frac{1}{nrN_0^{n-1}}$, the joint measure of $\bigcup_{i=1}^{r} \mathcal B_i$ is exactly
$\frac{1}{n}$, and every block $B$ from this family appears in any $x \in X$ with gaps ranging
between $\frac{1-\frac{1}{r}}{\mu(B)}$ and $\frac{1+\frac{1}{r}}{\mu(B)}$, revealing strong repelling.

Remark 1. Viewing blocks of length $nrN_0^{n-1}$ starting with a block from $\mathcal C_1$ as a
new alphabet, and repeating the above construction inductively, we can produce
an example (with the measure of maximal entropy on the intersection of systems
created in consecutive steps) with entropy $\log_2 N - 2\delta$, in which the strong repelling
will occur with probability $\frac{1}{n_k}$ for infinitely many lengths $n_k$.

Remark 2. The process described in the above remark is (somewhat coincidentally; it
was not designed for that) bilaterally deterministic: for every $m \in \mathbb N$ the sigma-field
$\mathcal P^{(-\infty,-m] \cup [m,\infty)}$ equals the full (product) sigma-field. Indeed, suppose we see all
entries of a point $x$ except on the interval $(-m, m)$. In a typical point, this interval
is contained between a pair of successive markers $i$ for some level $k$ of the inductive
construction. Then, by examining this point's entries far enough to the left and
right we will see completely all but one of the blocks from the family $\mathcal B_i$ which constitute
the block $C \in \mathcal C_i$ covering the considered interval. Because every block from $\mathcal B_i$ is
used in $C$ exactly once, by elimination, we will be able to determine the missing
block and hence all symbols in $(-m, m)$.

The next construction shows that there exists a process isomorphic to a Bernoulli
process with an almost sure limit law $\tilde F \equiv 0$ for the normalized hitting times (strong
attracting), achieved along a subsequence of upper density 1. In particular, this
answers in the negative a question of Zaqueu Coelho ([C]), whether all processes
isomorphic to Bernoulli processes have necessarily the exponential limit law for the
hitting (and return) times. The idea of the construction was suggested to us by D.
Rudolph ([R2]), who attributes the method to Arthur Rothstein.

Example 2. We will build a decreasing sequence of subshifts of finite type (SFT's).
In each we will regard the measure of maximal entropy. Begin with the full shift
$X_0$ on a finite alphabet. Select $r$ words $W_1, W_2, \dots, W_r$ of some length $l$ and
create $r$ SFT's: $X_0^{(1)}, X_0^{(2)}, \dots, X_0^{(r)}$, forbidding one of these words in each of them,
respectively. Choose a length $n$ so large that in the majority of blocks of this length
in $X_0^{(i)}$ all words $W_j$ except $W_i$ will appear at least once. Now choose another length
$m$, such that in the majority of blocks of this length every block $C$ of length between
$n$ and $n^2$ will appear many times. Now define $X_1$ as the subshift each point of which
is a concatenation of the form $\dots B_1 B_2 \cdots B_r B_1 \dots$, where $B_i$ is a block appearing
in $X_0^{(i)}$ of length either $m$ or $m+1$. Obviously, a typical block $C$ of any length
between $n$ and $n^2$ appearing in $X_1$ comes from some $X_0^{(i)}$, hence contains all $W_j$'s
except $W_i$, therefore in a typical $x \in X_1$, $C$ will appear many times within each
component $B_i$ representing $X_0^{(i)}$, and then it will be absent for a long time, until the
next representative of $X_0^{(i)}$. So, every such block will reveal strong attracting. It is
not hard to see that $X_1$ is a mixing SFT and its d-bar distance from the full shift is
small whenever the length $l$ of the (few) forbidden words $W_i$ is large. We can now
repeat the construction starting with $X_1$, and radically increasing all parameters.
We can arrange that the d-bar distances are summable, so the limit system $X$ (the
intersection of the $X_k$'s), more precisely its measure of maximal entropy, is also a
d-bar limit. Each mixing SFT is isomorphic to a Bernoulli process and this property
passes via d-bar limits (see [O], [Sh]), hence $X$ is also isomorphic to a Bernoulli
process. This system has the almost sure limit law $\tilde F \equiv 0$ for hitting times (or
$F \equiv 1$ for the return times) achieved along a sequence containing infinitely many
intervals of the form $[n, n^2]$. Such a sequence has upper density 1.

Remark 3. It is also possible to construct a process $X_h$ as above with any preassigned
entropy $h$. On the other hand, it is well known ([Si]) that every measure-preserving
transformation with positive entropy $h$ possesses a Bernoulli factor of
the same entropy. By the Ornstein Theorem ([O]) this factor is isomorphic to
$X_h$. The generator of $X_h$ appears as a partition of the space on which the initial
measure-preserving transformation is defined. This proves the universality of "the
law of series": in every measure-preserving transformation there exists a
partition generating the full entropy, which has the "strong attracting
properties" (i.e., almost sure limit law $\tilde F \equiv 0$ along a sequence of lengths of upper
density 1).

Various zero entropy processes with persistent repelling or attracting are implicit 
in the existing literature. Extreme repelling (with intensity converging to $e^{-1}$ as
the length of blocks grows) occurs for example in odometers, or, more generally, in 
rank one systems ([C-K]). For completeness, we sketch two zero entropy processes 
with features of positive entropy: repelling, and the unbiased behavior. 

Example 3. Take the product of the independent Bernoulli process on two symbols
with an odometer (modeled by an adequate process, for example a regular Toeplitz
subshift; see [D] for details on Toeplitz flows). Call this product process $X_0$. The
odometer factor provides, for each $k \in \mathbb N$, markers dividing each element into
so-called $k$-blocks of equal lengths $p_k$. Each $p_k$ is a multiple of $p_{k-1}$ and each $k$-block is
a concatenation of $(k-1)$-blocks. Now we create a new process $X_1$ by "stuttering":
if $x \in X_0$ is a concatenation $\dots ABCD \dots$ of 1-blocks, we create $x_1 \in X_1$ as
$\dots AABBCCDD \dots$, with the number of repetitions $q_1 = 2$. In $X_1$ the lengths of
the $k$-blocks for $k > 1$ have doubled. Repeating the stuttering for 2-blocks of $X_1$
with a number of repetitions $q_2 \ge 2$, we obtain a process $X_2$. And so on. Because
in each step we reduce the entropy by at least half, the limit process has entropy
zero. If the $q_k$'s grow sufficiently fast, we obtain, like in the previous example, a
system with strong attracting for a set of lengths of upper density 1. Consider a
modification of this example where $q_k = 2$ for each $k$ and each pair $AA$ (also $BB$,
etc.) is substituted by $A\bar A$ ($B\bar B$, etc.), where $\bar A$ is the "mirror" of $A$, i.e., with the
symbols 0 replaced by 1 and vice versa. It is not very hard to compute that
such a process (although it has entropy zero) has the same limit law properties as the
independent process: almost sure convergence along the full sequence $(n)$ to the
unbiased (exponential) limit law.

Remark 4. It is not hard to construct zero entropy processes with persistent mixed
behavior. For example, applying the "stuttering technique" to an odometer one
obtains a process in which a typical block $B$ occurs in periodically repeated pairs:
$\dots BB \dots BB \dots BB \dots$, i.e., with the function $G_B \approx \min\{1, \frac{t}{2}\}$ (which reveals
attracting with intensity $\frac{1 - \log 2}{2}$ at $t_1 = \log 2$ and repelling with intensity $e^{-2}$ at
$t_2 = 2$). We skip the details.
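
The two intensities quoted in this remark can be checked numerically. The sketch below assumes that the intensity of attracting (repelling) is measured as the largest amount by which $G_B$ lies below (above) $1 - e^{-t}$, and takes $G(t) = \min\{1, t/2\}$ as the limiting function; the grid and the helper names are illustrative.

```python
import math

def G(t):
    """Limiting hitting-time function from the remark: min{1, t/2}."""
    return min(1.0, t / 2.0)

def exp_law(t):
    """Unbiased (exponential) law 1 - e^{-t}."""
    return 1.0 - math.exp(-t)

grid = [i / 10000 for i in range(0, 60001)]          # t in [0, 6]
t_attr = max(grid, key=lambda t: exp_law(t) - G(t))  # G furthest below 1 - e^{-t}
t_rep = max(grid, key=lambda t: G(t) - exp_law(t))   # G furthest above 1 - e^{-t}

print("attracting:", t_attr, exp_law(t_attr) - G(t_attr))
print("repelling: ", t_rep, G(t_rep) - exp_law(t_rep))
```

The maximizers land at $t \approx \log 2$ and $t = 2$, with gaps close to $\frac{1-\log 2}{2}$ and $e^{-2}$ respectively.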

Questions 

Question 1. Is there a speed of the convergence to zero of the joint measure of the
"bad" blocks in Theorem 1? More precisely, does there exist a positive function
$s(n, \epsilon, \#\mathcal P)$ converging to zero as $n$ grows, such that if for some $\epsilon$ and infinitely
many $n$'s, the joint measure of the $n$-blocks which repel with intensity $\epsilon$ exceeds
$s(n, \epsilon, \#\mathcal P)$, then the process has necessarily entropy zero? (By Example 1, $\frac{1}{n}$
is not enough.)

Question 2. Can one strengthen Theorem 2 as follows:

$\limsup_{n \to \infty} \tilde F_{x,n} \le 1 - e^{-t}$ $\mu$-almost everywhere?

Question 3. In Lemma 3, can one obtain $\mathcal P^r$ conditionally $\beta$-independent of jointly
the past and all return times $R_B^{(k)}$ ($k \ge 1$) (for sufficiently large $n$, with $\mu$-tolerance
$\beta$ for $B \in \mathcal P^{-n}$)? In other words, can the $\beta$-independent process $((\mathcal P_B)^{\mathbb Z}, \mu_B, \sigma_B)$
be obtained $\beta$-independent of the factor-process generated by the partition into $B$
and its complement?

Question 4. (suggested by J-P. Thouvenot) Find a purely combinatorial proof of
Theorem 1, by counting the quantity of very long strings (of length $m$) inside
which a positive fraction (in measure) of all $n$-blocks repel with a fixed intensity.
For sufficiently large $n$ this quantity should be eventually (as $m \to \infty$) smaller than
$2^{hm}$ for any preassigned positive $h$.

Question 5. As we have mentioned, we only know about conditions which ensure 
that the limit law for the return time is exponential. It would be interesting to find 
a (large) class of positive entropy processes for which the distributions of return 
times are essentially deviated from exponential for bounded away from zero in 
measure collections of arbitrarily long blocks, i.e., a class of processes with persistent 
attracting. Can one prove that persistent attracting is, in some reasonable sense, 
a "typical" property in positive entropy, or that for a fixed measure-preserving 
transformation with positive entropy, a "typical" generator (partition) leads to 
persistent attracting? 